Abstract

A Fortran-77 program is presented for calculating test statistics that
compare a weighted histogram with an unweighted histogram, or two histograms
with weighted entries. The code calculates test statistics for histograms
with normalized event weights and with unnormalized event weights.
Program Title: CHICOM
Programming language: Fortran-77

Computer: Any Unix/Linux workstation or PC with a Fortran-77 compiler. 
Classification: 4.13, 11.9, 16.4, 19.4 

External routines/libraries used: FPLSOR (M103) [1] and BRENT [2]
Nature of problem: The program calculates test statistics for comparing two
weighted histograms and an unweighted histogram with a weighted one.
Solution method: The test statistics are calculated according to the formulas
presented in Ref. [3].

Running time: 0.001 sec for a 5-bin histogram.
1. Introduction

A histogram with $m$ bins for a given probability density function $p(x)$ is
used to estimate the probabilities $p_i$ that a random event belongs to bin $i$:

$$p_i = \int_{S_i} p(x)\,dx, \quad i = 1, \ldots, m. \qquad (1)$$

Integration in (1) is carried out over the bin $S_i$, and $\sum_{i=1}^{m} p_i = 1$.
A histogram can be obtained as a result of a random experiment with the
probability density function $p(x)$.

A frequently used technique in data analysis is the comparison of two
distributions through the comparison of histograms. The hypothesis of
homogeneity is that the two histograms represent random values with identical
distributions. It is equivalent to the existence of $m$ constants
$p_1, \ldots, p_m$ such that $\sum_{i=1}^{m} p_i = 1$ and the probability of
belonging to the $i$th bin for some measured value in both experiments is
equal to $p_i$.

Let us denote the number of random events belonging to the $i$th bin of the
first and second histograms as $n_{1i}$ and $n_{2i}$, respectively. The total
number of events in each histogram is $n_j = \sum_{i=1}^{m} n_{ji}$, where $j = 1, 2$.

As shown in [1], the statistic

$$X^2 = \frac{1}{n_1 n_2} \sum_{i=1}^{m} \frac{(n_2 n_{1i} - n_1 n_{2i})^2}{n_{1i} + n_{2i}} \qquad (2)$$

has approximately a $\chi^2_{m-1}$ distribution if the hypothesis of homogeneity is valid.
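
As a concrete illustration of statistic (2), the following minimal Python sketch evaluates $X^2 = \frac{1}{n_1 n_2}\sum_i (n_2 n_{1i} - n_1 n_{2i})^2/(n_{1i} + n_{2i})$ for two unweighted histograms. The function name `chi2_two_histograms` is hypothetical and not part of CHICOM; the bin contents are those listed for Test 1 in the test-run section.

```python
def chi2_two_histograms(n1, n2):
    """Classical homogeneity statistic for two unweighted histograms.

    n1, n2: sequences of bin contents (event counts) with equal length.
    """
    if len(n1) != len(n2):
        raise ValueError("histograms must have the same number of bins")
    tot1, tot2 = sum(n1), sum(n2)
    stat = 0.0
    for a, b in zip(n1, n2):
        if a + b == 0:
            continue  # a bin empty in both histograms contributes nothing
        stat += (tot2 * a - tot1 * b) ** 2 / (a + b)
    return stat / (tot1 * tot2)


# Bin contents of Test 1 (both histograms have unweighted entries).
first = [11, 58, 234, 102, 95]
second = [30, 119, 439, 182, 230]

x2 = chi2_two_histograms(first, second)
ndf = len(first) - 1  # m - 1 degrees of freedom
print(f"X^2 = {x2:.4f}, ndf = {ndf}")
```

For these data the sketch gives a value of about 4.74 with 4 degrees of freedom. This is close to, but not identical with, the STAT value reported for Test 1, since CHICOM evaluates the median statistic (7) rather than (2).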

Weighted histograms are often obtained as a result of Monte-Carlo
simulations. References [2], [3], [4] are examples of research in high-energy
physics, statistical mechanics, and astrophysics using such histograms.



To define a weighted histogram, let us write the probability $p_i$ (1) for a
given probability density function $p(x)$ in the form

$$p_i = \int_{S_i} p(x)\,dx = \int_{S_i} w(x) g(x)\,dx, \qquad (3)$$

where

$$w(x) = p(x)/g(x) \qquad (4)$$

is the weight function and $g(x)$ is some other probability density function.
The function $g(x)$ must be $> 0$ for points $x$ where $p(x) \neq 0$. The weight
$w(x) = 0$ if $p(x) = 0$, see Ref. [5]. Because of the condition $\sum_i p_i = 1$,
we will call the weights defined above normalized weights, as opposed to the
unnormalized weights $\check{w}(x)$, which are $\check{w}(x) = \mathrm{const} \cdot w(x)$.

The histogram with normalized weights is obtained from a random
experiment with a probability density function $g(x)$, with the weights of the
events calculated according to (4). Let us denote the total sum of the
weights of the events in the $i$th bin of the histogram with normalized weights
as

$$W_i = \sum_{l=1}^{n_i} w_i(l), \qquad (5)$$

where $n_i$ is the number of events in bin $i$ and $w_i(l)$ is the weight of the $l$th
event in the $i$th bin. The total number of events in the histogram is
$n = \sum_{i=1}^{m} n_i$, where $m$ is the number of bins. The quantity $\hat{p}_i = W_i/n$
is the estimator of $p_i$ with the expectation value $E[\hat{p}_i] = p_i$. Note that in
the case where $g(x) = p(x)$, the weights of the events are equal to 1 and the
histogram with normalized weights is the usual histogram with unweighted
entries.

Let us introduce the notation needed for the description of tests for comparing
histograms:

• $W_{ji} = \sum_{l=1}^{n_{ji}} w_{ji}(l)$, the total sum of the weights of the events in the
$i$th bin of the $j$th histogram with normalized weights;

• $r_{ji} = \sum_{l=1}^{n_{ji}} w_{ji}(l) / \sum_{l=1}^{n_{ji}} w_{ji}^2(l)$, the estimator of the ratio of moments in
the $i$th bin of the $j$th histogram with normalized weights.
We introduce the same quantities for the histograms with unnormalized
weighted entries:

• $\check{W}_{ji} = \sum_{l=1}^{n_{ji}} \check{w}_{ji}(l)$;

• $\check{r}_{ji} = \sum_{l=1}^{n_{ji}} \check{w}_{ji}(l) / \sum_{l=1}^{n_{ji}} \check{w}_{ji}^2(l)$.

Notice that $W_{ji} = n_{ji}$ and $r_{ji} = 1$ for histograms with unweighted entries.
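
The per-bin quantities defined above can be accumulated in a single pass over the events. The following Python sketch is an illustration only: the function `weighted_histogram` and its arguments are hypothetical names, equal-width bins are assumed, and the ratio is set to 0 in empty bins as a placeholder convention that the text does not specify. The two returned sums are the per-bin sum of weights and sum of squared weights, which is the kind of content described for the arrays AEX and ERAEX in Section 2.

```python
def weighted_histogram(xs, ws, lo, hi, nbins):
    """Accumulate per-bin sum of weights, sum of squared weights and their ratio.

    xs: event positions; ws: event weights; [lo, hi]: histogram range.
    Returns (sum_w, sum_w2, ratio), with ratio[i] = sum_w[i] / sum_w2[i].
    """
    sum_w = [0.0] * nbins
    sum_w2 = [0.0] * nbins
    width = (hi - lo) / nbins
    for x, w in zip(xs, ws):
        if not (lo <= x <= hi):
            continue  # events outside the range are ignored
        i = min(int((x - lo) / width), nbins - 1)  # x == hi goes to the last bin
        sum_w[i] += w
        sum_w2[i] += w * w
    # Placeholder convention (an assumption): ratio 0 for empty bins.
    ratio = [sw / s2 if s2 > 0 else 0.0 for sw, s2 in zip(sum_w, sum_w2)]
    return sum_w, sum_w2, ratio


# Illustrative events on [4, 16] with 5 bins.
events = [4.5, 5.0, 7.0, 9.0, 9.5, 10.0, 12.0, 15.0]

# Unweighted entries: every weight is 1, so sum_w is the count and ratio is 1.
w1, w1sq, r1 = weighted_histogram(events, [1.0] * len(events), 4.0, 16.0, 5)

# Weights multiplied by a constant 0.5: sums scale by 0.5, ratio becomes 2.
w2, w2sq, r2 = weighted_histogram(events, [0.5] * len(events), 4.0, 16.0, 5)
print(w1, r1)
print(w2, r2)
```

With unit weights the sketch reproduces the remark above: the sum of weights equals the number of events in the bin and the ratio equals 1.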

Three types of statistics used for comparing histograms are presented in
Ref. [6].

Histograms with normalized weighted entries. 

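Let us introduce the statistic

$${}_1X_k^2 = \sum_{j=1}^{2} \left[ \sum_{i \neq k} \frac{r_{ji} W_{ji}^2}{n_j p_i} + \frac{1}{n_j} \frac{\left(n_j - \sum_{i \neq k} r_{ji} W_{ji}\right)^2}{1 - \sum_{i \neq k} r_{ji} p_i} - n_j \right], \qquad (6)$$

with the sums in (6) extending over all bins $i$ except one bin $k$. In equation
(6), the probabilities $p_i$ are unknown, and estimators $\hat{p}_i$ of the probabilities
are found by minimization of (6). We denote by ${}_1\hat{X}_k^2$ the value of ${}_1X_k^2$ after
substitution of the estimators $\hat{p}_i$ into (6). As shown in [6], the statistic

$${}_1X^2 = \mathrm{Med}\,\{{}_1\hat{X}_1^2, {}_1\hat{X}_2^2, \ldots, {}_1\hat{X}_m^2\} \qquad (7)$$

approximately has a $\chi^2_{m-1}$ distribution if the hypothesis of homogeneity is
valid.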
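
For the special case of unweighted entries the minimization can be done in closed form, which gives a simple numerical check. The sketch below assumes that, with $W_{ji} = n_{ji}$ and $r_{ji} = 1$, the statistic reduces for every $k$ to the Pearson sum $\sum_j \sum_i (n_{ji} - n_j p_i)^2/(n_j p_i) = \sum_i A_i/p_i - (n_1 + n_2)$ with $A_i = n_{1i}^2/n_1 + n_{2i}^2/n_2$. Minimizing over $p_i$ under $\sum_i p_i = 1$ then gives $\hat{p}_i \propto \sqrt{A_i}$ and the minimum $(\sum_i \sqrt{A_i})^2 - (n_1 + n_2)$. The function names are hypothetical, and the closed form does not apply to weighted entries, where a numerical minimization is needed.

```python
import math


def pearson_two_histograms(n1, n2, p):
    """Pearson sum over both histograms for given bin probabilities p."""
    total = 0.0
    for counts in (n1, n2):
        n = sum(counts)
        total += sum((c - n * q) ** 2 / (n * q) for c, q in zip(counts, p))
    return total


def min_pearson_two_histograms(n1, n2):
    """Closed-form minimum of the Pearson sum over p (unweighted case only)."""
    tot1, tot2 = sum(n1), sum(n2)
    a = [x * x / tot1 + y * y / tot2 for x, y in zip(n1, n2)]
    root_sum = sum(math.sqrt(v) for v in a)
    p_hat = [math.sqrt(v) / root_sum for v in a]  # minimizing probabilities
    stat = root_sum ** 2 - (tot1 + tot2)
    return stat, p_hat


# Bin contents of Test 1 (unweighted entries in both histograms).
first = [11, 58, 234, 102, 95]
second = [30, 119, 439, 182, 230]

stat, p_hat = min_pearson_two_histograms(first, second)

# Pooled estimate of p for comparison; it is not the minimizer.
pooled = [(a + b) / (sum(first) + sum(second)) for a, b in zip(first, second)]
stat_pooled = pearson_two_histograms(first, second, pooled)
print(f"minimum = {stat:.4f}, pooled = {stat_pooled:.4f}")
```

For the Test 1 data the minimum comes out at about 4.739, in agreement with the STAT value reported for that test, while the pooled estimate gives the slightly larger value of about 4.744.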

Histograms with unnormalized weighted entries. 

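Let us introduce the statistic

$${}_2X_k^2 = \sum_{j=1}^{2} \frac{s_{jk}^2}{n_j} + 2 \sum_{j=1}^{2} s_{jk}, \qquad (8)$$

where

$$s_{jk} = \sqrt{\sum_{i \neq k} \check{r}_{ji} p_i \sum_{i \neq k} \check{r}_{ji} \check{W}_{ji}^2 / p_i} - \sum_{i \neq k} \check{r}_{ji} \check{W}_{ji}. \qquad (9)$$

Again, estimators $\hat{p}_i$ of the unknown probabilities $p_i$ are found by minimization of
(8). We denote by ${}_2\hat{X}_k^2$ the value of ${}_2X_k^2$ after substitution of the estimators
$\hat{p}_i$ into (8). As shown in [6], the statistic

$${}_2X^2 = \mathrm{Med}\,\{{}_2\hat{X}_1^2, {}_2\hat{X}_2^2, \ldots, {}_2\hat{X}_m^2\} \qquad (10)$$

approximately has a $\chi^2_{m-2}$ distribution if the hypothesis of homogeneity is
valid.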
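
The structure of (8) and (9) can be explored numerically. The Python sketch below assumes that (9) reads $s_{jk} = \sqrt{\sum_{i \neq k} \check{r}_{ji} p_i \sum_{i \neq k} \check{r}_{ji} \check{W}_{ji}^2/p_i} - \sum_{i \neq k} \check{r}_{ji} \check{W}_{ji}$ and that (8) is $\sum_j s_{jk}^2/n_j + 2\sum_j s_{jk}$. The function names and the trial probabilities `p_trial` are hypothetical, all bins are assumed non-empty, and no minimization is performed, so the printed numbers are not the test statistic. The sketch only checks that the expression does not change when all weights of one histogram are multiplied by a constant, which is what one expects of a statistic for unnormalized weights.

```python
import math


def s_term(sum_w, sum_w2, p, k):
    """s_jk for one histogram from per-bin sums of weights and squared weights."""
    m = len(p)
    r = [sum_w[i] / sum_w2[i] for i in range(m)]  # requires non-empty bins
    idx = [i for i in range(m) if i != k]         # all bins except bin k
    a = sum(r[i] * p[i] for i in idx)
    b = sum(r[i] * sum_w[i] ** 2 / p[i] for i in idx)
    c = sum(r[i] * sum_w[i] for i in idx)
    return math.sqrt(a * b) - c


def x2k_unnormalized(hist1, hist2, n1, n2, p, k):
    """Expression (8) for a given excluded bin k and given probabilities p."""
    s1 = s_term(hist1[0], hist1[1], p, k)
    s2 = s_term(hist2[0], hist2[1], p, k)
    return s1 * s1 / n1 + s2 * s2 / n2 + 2.0 * (s1 + s2)


# Input arrays of Test 2: (sums of weights, sums of squared weights).
hist1 = ([9.3018, 22.8871, 122.0670, 51.6786, 46.2622],
         [0.8026, 7.7173, 142.7876, 27.7087, 28.5724])
hist2 = ([68.9455, 213.5029, 898.8528, 397.7258, 419.0171],
         [108.3022, 229.3163, 3697.7102, 1455.0262, 699.6888])
p_trial = [0.03, 0.11, 0.45, 0.21, 0.20]  # hypothetical trial probabilities

values = [x2k_unnormalized(hist1, hist2, 500, 1000, p_trial, k) for k in range(5)]

# Multiply every weight of the first histogram by c = 3:
# sums of weights scale by c, sums of squared weights by c**2.
c = 3.0
scaled1 = ([c * w for w in hist1[0]], [c * c * w2 for w2 in hist1[1]])
scaled = [x2k_unnormalized(scaled1, hist2, 500, 1000, p_trial, k) for k in range(5)]
print(values)
print(scaled)
```

By the Cauchy-Schwarz inequality the square root in the assumed form of (9) is never smaller than the subtracted sum, so each $s_{jk}$ is non-negative.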

Histograms with normalized and unnormalized weighted entries. 

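Let us introduce the statistic

$${}_3X_k^2 = \sum_{i \neq k} \frac{r_{1i} W_{1i}^2}{n_1 p_i} + \frac{1}{n_1} \frac{\left(n_1 - \sum_{i \neq k} r_{1i} W_{1i}\right)^2}{1 - \sum_{i \neq k} r_{1i} p_i} - n_1 + \frac{s_{2k}^2}{n_2} + 2 s_{2k}. \qquad (11)$$

We denote by ${}_3\hat{X}_k^2$ the value of ${}_3X_k^2$ after substitution of the estimators $\hat{p}_i$
into (11). As shown in [6], the statistic

$${}_3X^2 = \mathrm{Med}\,\{{}_3\hat{X}_1^2, {}_3\hat{X}_2^2, \ldots, {}_3\hat{X}_m^2\} \qquad (12)$$

approximately has a $\chi^2_{m-2}$ distribution if the hypothesis of homogeneity is
valid.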

The chi-square approximation is asymptotic. This means that the critical 
values may not be valid if the expected frequencies are too small. The use 
of the chi-square test is inappropriate if any expected frequency is < 1, or 
if the expected frequency is < 5 in > 20% of the bins for either histogram. 
This restriction, observed in the usual chi-square test [7], is quite reasonable
for the proposed test.
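
The applicability rule above is easy to check before the test is run. The following Python sketch is a hypothetical helper, not part of CHICOM: it takes the expected frequencies of one histogram and returns whether the chi-square approximation is considered appropriate under the two conditions just stated. It should be applied to each of the two histograms.

```python
def chi2_approximation_ok(expected):
    """Check the rule of thumb for one histogram.

    expected: expected frequencies per bin.
    Returns False if any expected frequency is < 1, or if the expected
    frequency is < 5 in more than 20% of the bins.
    """
    if any(e < 1 for e in expected):
        return False
    n_small = sum(1 for e in expected if e < 5)
    # "more than 20% of the bins", written with integers to avoid rounding.
    return 5 * n_small <= len(expected)


print(chi2_approximation_ok([14.8, 55.3, 223.0, 103.3, 103.6]))
print(chi2_approximation_ok([0.5, 10.0, 10.0, 10.0, 10.0]))
print(chi2_approximation_ok([4.0, 4.0, 10.0, 10.0, 10.0]))
```

The first call represents a histogram with well-populated bins and passes the check; the second fails because one expected frequency is below 1, and the third fails because two of five bins (40%) have expected frequencies below 5.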

Information for readers. Recently, another paper dedicated to weighted
histograms has been published in "Computer Physics Communications", see
Ref. [9]. In it the same author presents a program for a goodness-of-fit test
for histograms with weighted and unweighted entries. That test is used in data
analysis to compare theoretical frequencies with the frequencies represented
by a histogram.

2. Computer program 

CHICOM is a subroutine that can be called from Fortran programs
to calculate the test statistics ${}_1X^2$, ${}_2X^2$ and ${}_3X^2$.

Usage 

CALL CHICOM(AEX, ERAEX, NEV, AMC, ERAMC, NMC, NCHA, MODE, STAT, NDF, IFAIL)

Input data
AEX - one-dimensional real array with the contents of the first weighted
histogram

ERAEX - one-dimensional real array with the contents of the first histogram
filled with the squares of the weights

NEV - number of events in the first histogram, $n_1$

AMC - one-dimensional real array with the contents of the second weighted
histogram

ERAMC - one-dimensional real array with the contents of the second histogram
filled with the squares of the weights

NMC - number of events in the second histogram, $n_2$

NCHA - number of bins, $m$

MODE - equal to 1 if both histograms have normalized weights, 2 if both
histograms have unnormalized weights, and 3 if the first histogram has
normalized weights and the second has unnormalized weights

Output data

STAT - test statistic

NDF - number of degrees of freedom $l$ of the $\chi^2_l$ distribution if hypothesis $H_0$
is true (will be $l = m - 1$ or $l = m - 2$)

IFAIL - will be > 0 if the calculation is not successful.



3. Test run 

We take a distribution

$$p(x) \propto \frac{2}{(x - 10)^2 + 1} + \frac{1}{(x - 14)^2 + 1} \qquad (13)$$

defined on the interval [4, 16] and representing two so-called Breit-Wigner
peaks [8]. Three cases of the probability density function $g(x)$ are considered:

$$g_1(x) = p(x) \qquad (14)$$

$$g_2(x) = 1/12 \qquad (15)$$

$$g_3(x) \propto \frac{2}{(x - 9)^2 + 1} + \frac{2}{(x - 15)^2 + 1} \qquad (16)$$

Distribution $g_1(x)$ (14) results in a histogram with unweighted entries,
while distribution $g_2(x)$ (15) is a uniform distribution on the interval [4, 16].
Distribution $g_3(x)$ (16) has the same form of parametrization as $p(x)$ (13),
but with different values for the parameters.
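
The bin probabilities implied by the test distribution can be obtained analytically, since the integral of $1/((x - a)^2 + 1)$ is $\arctan(x - a)$. The Python sketch below assumes $p(x) \propto 2/((x - 10)^2 + 1) + 1/((x - 14)^2 + 1)$ on [4, 16] and five bins of equal width; the function names are hypothetical.

```python
import math


def peak_integral(a, b, centre):
    """Integral of 1 / ((x - centre)**2 + 1) over [a, b]."""
    return math.atan(b - centre) - math.atan(a - centre)


def bin_probabilities(lo=4.0, hi=16.0, nbins=5):
    """Bin probabilities of the assumed two-peak distribution p(x)."""
    width = (hi - lo) / nbins
    raw = []
    for i in range(nbins):
        a, b = lo + i * width, lo + (i + 1) * width
        raw.append(2.0 * peak_integral(a, b, 10.0) + 1.0 * peak_integral(a, b, 14.0))
    norm = sum(raw)  # normalizes p(x) on [lo, hi]
    return [v / norm for v in raw]


probs = bin_probabilities()
expected_500 = [500 * q for q in probs]  # expected contents for 500 events
print([round(q, 4) for q in probs])
print([round(e, 1) for e in expected_500])
```

Under these assumptions the central bin carries a probability of roughly 0.45, and the expected contents for 500 events are of the same size as the AEX values listed for Test 1.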

Three cases were considered:

Case   First histogram                      Second histogram
       type of weight   weight              type of weight   weight
1      normalized       p(x)/g1(x) = 1      normalized       p(x)/g1(x) = 1
2      unnormalized     0.5 p(x)/g2(x)      unnormalized     2 p(x)/g3(x)
3      normalized       p(x)/g1(x) = 1      unnormalized     0.5 p(x)/g3(x)


For each case, histograms with 5 bins were created by simulating 500
entries for the first histogram and 1000 entries for the second one. The
results of the calculations are presented below.

Test 1

INPUT

AEX     11.0000   58.0000  234.0000  102.0000   95.0000
ERAEX   11.0000   58.0000  234.0000  102.0000   95.0000
NEV     500
AMC     30.0000  119.0000  439.0000  182.0000  230.0000
ERAMC   30.0000  119.0000  439.0000  182.0000  230.0000
NMC     1000
NCHA    5
MODE    1

OUTPUT

STAT    4.7391 (p-value = 0.3151)
NDF     4
IFAIL   0
Test 2

INPUT

AEX      9.3018   22.8871  122.0670   51.6786   46.2622
ERAEX    0.8026    7.7173  142.7876   27.7087   28.5724
NEV     500
AMC     68.9455  213.5029  898.8528  397.7258  419.0171
ERAMC  108.3022  229.3163 3697.7102 1455.0262  699.6888
NMC     1000
NCHA    5
MODE    2

OUTPUT

STAT    1.9111 (p-value = 0.5911)
NDF     3
IFAIL   0

Test 3

INPUT

AEX     17.0000   53.0000  225.0000  101.0000  104.0000
ERAEX   17.0000   53.0000  225.0000  101.0000  104.0000
NEV     500
AMC     14.2303   53.9921  204.9794  111.6337  101.1128
ERAMC    5.4897   14.5935  198.6223  103.7259   40.9275
NMC     1000
NCHA    5
MODE    3

OUTPUT

STAT    1.4431 (p-value = 0.6955)
NDF     3
IFAIL   0
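
The p-values quoted next to STAT can be reproduced from STAT and NDF alone, because the upper tail probability of a chi-square distribution with an integer number of degrees of freedom has a closed form. The Python sketch below is an illustration with a hypothetical function name, `chi2_sf`; it uses only the standard `math` module and the recurrence $Q(a + 1, h) = Q(a, h) + h^a e^{-h}/\Gamma(a + 1)$ for the regularized upper incomplete gamma function, started from $Q(1, h) = e^{-h}$ or $Q(1/2, h) = \mathrm{erfc}(\sqrt{h})$.

```python
import math


def chi2_sf(x, ndf):
    """Upper tail probability P(chi2 > x) for integer ndf >= 1 and x >= 0."""
    if ndf < 1 or x < 0:
        raise ValueError("ndf must be >= 1 and x must be >= 0")
    h = x / 2.0
    if ndf % 2 == 0:
        a, q = 1.0, math.exp(-h)             # Q(1, h)
        steps = ndf // 2 - 1
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))  # Q(1/2, h)
        steps = (ndf - 1) // 2
    for _ in range(steps):
        # Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


# (STAT, NDF) pairs from Tests 1 to 3.
results = [(4.7391, 4), (1.9111, 3), (1.4431, 3)]
p_values = [chi2_sf(stat, ndf) for stat, ndf in results]
for (stat, ndf), pv in zip(results, p_values):
    print(f"STAT = {stat:.4f}, NDF = {ndf}, p-value = {pv:.4f}")
```

The three values obtained this way agree with the p-values 0.3151, 0.5911 and 0.6955 listed in the test outputs to within rounding.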